118 research outputs found
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies, or even irrelevant information. The more complex the reality being analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions taken on the basis of KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details on how this can be achieved and discusses the role of pre- and post-processing in the whole process of knowledge discovery in environmental systems.
Data mining as a tool for environmental scientists
Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and, more recently, Bayesian Decision Networks have found application in environmental modelling, while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogeneous.
Crossing the death valley to transfer environmental decision support systems to the water market
Environmental decision support systems (EDSSs) are attractive tools to cope with the complexity of environmental global challenges. Several thoughtful reviews have analyzed EDSSs to identify the key challenges and best practices for their development. One of the major criticisms is that a wide and generalized use of deployed EDSSs has not been observed. The paper briefly describes and compares four case studies of EDSSs applied to the water domain, where the key aspects involved in the initial conception and the use and transfer evolution that determine the final success or failure of these tools (i.e., market uptake) are identified. Those aspects that contribute to bridging the gap between the EDSS science and the EDSS market are highlighted in the manuscript. Experience suggests that the construction of a successful EDSS should focus significant efforts on crossing the death-valley toward a general use implementation by society (the market) rather than on development. The authors would like to thank the Catalan Water Agency (Agència Catalana de l’Aigua), Besòs River Basin Regional Administration (Consorci per la Defensa de la Conca del Riu Besòs), SISLtech, and Spanish Ministry of Science and Innovation for providing funding (CTM2012-38314-C02-01 and CTM2015-66892-R). LEQUIA, KEMLG, and ICRA were recognized as consolidated research groups by the Catalan Government under the codes 2014-SGR-1168, 2013-SGR-1304 and 2014-SGR-291. Peer Reviewed. Postprint (published version).
Analysing similarity assessment in feature-vector case representations
Case-Based Reasoning (CBR) is a good technique for solving new problems based on previous experience. The main assumption in CBR is that similar problems should have similar solutions. CBR systems retrieve the most similar cases or experiences among those stored in the Case Base. The solutions previously given to these most similar past cases can then be adapted to fit new cases or problems in a particular domain, instead of deriving solutions from scratch. Thus, similarity measures are key elements in obtaining reliable similar cases, which will be used to derive solutions for new cases. This paper describes a comparative analysis of several commonly used similarity measures, including a measure previously developed by the authors, and a study on its performance in the CBR retrieval step for feature-vector case representations. The testing has been done using sixteen data sets from the UCI Machine Learning Database Repository, plus two complex environmental databases. Postprint (published version).
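The retrieval step described in this abstract can be illustrated with a minimal sketch. It assumes numeric feature vectors and a range-normalized weighted distance. This is a generic measure chosen for illustration, not the measure developed by the authors, and the function names, case data and solution labels are all hypothetical.

```python
def similarity(a, b, ranges, weights=None):
    """Weighted similarity in [0, 1] between two numeric feature vectors.

    Each per-feature distance is divided by that feature's range so that
    features measured in different units contribute comparably.
    """
    if weights is None:
        weights = [1.0] * len(a)
    total = sum(weights)
    dist = 0.0
    for x, y, r, w in zip(a, b, ranges, weights):
        # Clamp to 1 so a query outside the observed range cannot
        # push the similarity below 0.
        d = min(1.0, abs(x - y) / r) if r > 0 else 0.0
        dist += w * d
    return 1.0 - dist / total


def retrieve(case_base, query, k=1):
    """Return the k stored (similarity, features, solution) triples
    that are most similar to the query, best first."""
    features = [f for f, _ in case_base]
    ranges = [max(col) - min(col) for col in zip(*features)]
    scored = [(similarity(f, query, ranges), f, sol) for f, sol in case_base]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]


# Hypothetical case base: (feature vector, stored solution).
case_base = [
    ((7.2, 310.0), "increase aeration"),
    ((6.8, 120.0), "no action"),
    ((7.9, 450.0), "reduce inflow"),
]
best = retrieve(case_base, (7.0, 300.0), k=1)
print(best[0][2])
```

The retrieved solution would then be adapted to the new case. Swapping in a different `similarity` function is the only change needed to compare measures on the same case base.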
Using entropy-based local weighting to improve similarity assessment
This paper enhances and analyses the power of locally weighted similarity measures. It proposes a new entropy-based local weighting algorithm to be used in similarity assessment to improve the performance of the CBR retrieval task. A comparative analysis has been carried out of the performance of unweighted similarity measures, globally weighted similarity measures, and locally weighted similarity measures. The testing has been done using several similarity measures, some data sets from the UCI Machine Learning Database Repository, and other environmental databases. Postprint (published version).
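The abstract does not give the weighting algorithm itself, so the following is only a rough sketch of what entropy-based local weighting can look like, under stated assumptions. Each feature is split into equal-width bins, and its weight for a given query is one minus the normalized class entropy of the stored cases that fall in the query's bin. The binning scheme, the function names and the toy data are all hypothetical, and this is not the authors' algorithm.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def local_weights(cases, query, n_bins=3):
    """One weight per feature, computed for this particular query.

    A feature gets a high weight when the stored cases whose value lies
    in the same bin as the query's value mostly share one class label
    (low entropy), and a low weight when their labels are mixed.
    """
    labels = [lab for _, lab in cases]
    n_classes = len(set(labels))
    max_h = math.log2(n_classes) if n_classes > 1 else 1.0
    weights = []
    for j, q in enumerate(query):
        column = [f[j] for f, _ in cases]
        lo, hi = min(column), max(column)
        width = (hi - lo) / n_bins or 1.0  # avoid a zero bin width

        def bin_of(v):
            return min(n_bins - 1, max(0, int((v - lo) / width)))

        qb = bin_of(q)
        in_bin = [lab for v, lab in zip(column, labels) if bin_of(v) == qb]
        # An empty bin carries no evidence, so treat it as maximally uncertain.
        h = entropy(in_bin) if in_bin else max_h
        weights.append(1.0 - h / max_h)
    return weights


# Hypothetical data: feature 0 separates the classes, feature 1 does not.
cases = [
    ((1.0, 5.0), "A"),
    ((1.2, 9.0), "A"),
    ((3.0, 5.2), "B"),
    ((3.1, 8.8), "B"),
]
weights = local_weights(cases, (1.1, 5.1))
print(weights)
```

Because the weights depend on where the query falls, two queries can weight the same feature differently. A global scheme would instead assign one fixed weight per feature.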
Addressing the evaluation of EDSS-maintenance
Daily operation and maintenance tasks are needed to guarantee the correct performance of constructed wetlands. The definition of these activities is a complex task since these actions vary according to the characteristics of each facility. To support the definition of these operation and maintenance protocols an Environmental Decision Support System (EDSS) has been constructed (EDSS-maintenance). The methodology used to develop EDSS-maintenance is based on the following five steps: environmental problem analysis, data and knowledge acquisition, model selection, model implementation and evaluation process. The first four steps have been finished; however, the evaluation process is ongoing. This document presents a new approach for this step: two numerical indices allow (a) verifying the performance of the EDSS-maintenance and (b) validating the compliance of the protocols with the user requirements. Moreover, another index enables an easy revision and improvement of the knowledge bases (problems, causes and actions) and so enhances the decision support system. Postprint (published version).
Modeling the input-output behaviour of wastewater treatment plants using soft computing techniques
Control and prediction of Wastewater Treatment Plants (WWTPs) under a wide range of operating conditions is an important goal in order to avoid upsetting the environmental balance, keep the system in stable operating conditions, and support suitable decision-making. In this respect, the availability of models characterizing WWTP behaviour as a dynamic system is a necessary first step. However, due to the high complexity of the WWTP processes and the heterogeneity, incompleteness and imprecision of WWTP data, finding suitable models poses substantial problems. In this paper, an approach via soft computing techniques is sought, in particular by experimenting with fuzzy heterogeneous time-delay neural networks to characterize the time variation of outgoing variables. Experimental results show that these networks are able to characterize WWTP behaviour in a statistically satisfactory sense and also that they perform better than other well-established neural network models. Peer Reviewed. Postprint (published version).